A Novel Bangla Font Recognition Approach Using Deep Learning
Md. Majedul Islam, A K M Shahariar Azad Rabby, Nazmul Hasan, Jebun Nahar, Fuad Rahman
Accepted to be presented at IEMIS 2020: International Conference on Emerging Technologies in Data Mining and Information Security, 2nd - 4th July, 2020, Kolkata, India
Description
Font detection is an essential preprocessing step for printed character
recognition. In this era of computerization and automation, computer-composed
documents such as official documents, bank checks, loan applications, visiting
cards, invitation cards, educational materials, etc. are used everywhere. Beyond
editing and processing documents, converting documents such as invitation cards and billboards from one format to another is another major application area
where a designer has to recognize the font details from the images. There is a lot of
research on automatic font detection published for high-resource languages such as
English. Still, not much has been reported for a low-resource language such as
Bangla. Bangla has a complex structure because of the use of diacritics, compound
characters, and graphemes. Furthermore, because of the popularity of digital, online
publications, there has been a recent surge of fonts in Bangla. Font detection can
also help analysts detect changes in font choices based on socio-political divides:
for example, consider that fonts common in Bangladesh may not be as popular
among Bangla publications in India. In this paper, we present a Convolutional Neural Network (CNN) approach for detecting Bangla fonts, using a space adjustment
method dependent on a Stacked Convolutional Auto-Encoder (SCAE). As part of
the work, we built a large corpus of printed documents consisting of 12,187 images
in 7 different Bangla fonts, forming a total of 77,728 samples by augmentations to
train and validate our model. Our proposed model achieves 98.73% average font
recognition accuracy on the validation set.